Overview

Dataset statistics

Number of variables: 19
Number of observations: 40395
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 16.5 MiB
Average record size in memory: 429.3 B

Variable types

NUM: 8
BOOL: 6
CAT: 5

Warnings

city_name has a high cardinality: 1046 distinct values [High cardinality]
region is highly correlated with province [High correlation]
province is highly correlated with region [High correlation]
surface_of_the_land is highly skewed (γ1 = 53.15034165) [Skewed]
df_index has unique values [Unique]
surface_of_the_land has 20751 (51.4%) zeros [Zeros]

Reproduction

Analysis started: 2020-09-18 11:55:20.177329
Analysis finished: 2020-09-18 11:55:44.289870
Duration: 24.11 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
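
A report of this kind can be regenerated along the following lines. This is only a minimal sketch: the helper name `build_report`, the CSV input, and the report title are assumptions for illustration, since the original data loading step is not part of this report. `ProfileReport` and its `to_file` method are the pandas-profiling entry points.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so the function can be defined even where
    # pandas / pandas-profiling are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # assumed input format
    profile = ProfileReport(df, title="Dataset profile")  # assumed title
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("listings.csv")` (a hypothetical file name) would write `report.html` next to the script.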

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 40395
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25808.86065
Minimum: 0
Maximum: 52075
Zeros: 1
Zeros (%): < 0.1%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2572.7
Q1: 12715.5
Median: 25666
Q3: 38836
95-th percentile: 49437.3
Maximum: 52075
Range: 52075
Interquartile range (IQR): 26120.5

Descriptive statistics

Standard deviation: 15083.3038
Coefficient of variation (CV): 0.5844234663
Kurtosis: -1.208587665
Mean: 25808.86065
Median Absolute Deviation (MAD): 13087
Skewness: 0.01817640765
Sum: 1042548926
Variance: 227506053.6
Monotonicity: Strictly increasing
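
Several of the figures above are derived from others, which gives a quick consistency check. The sketch below recomputes the mean, coefficient of variation, IQR, range, and variance for df_index from the reported values; the variable names are illustrative, and the inputs are copied from the tables above rather than from the underlying data.

```python
# Values copied from the df_index tables above.
n_observations = 40395
total = 1042548926
std = 15083.3038
q1, q3 = 12715.5, 38836
minimum, maximum = 0, 52075

mean = total / n_observations      # Mean = Sum / number of observations
cv = std / mean                    # Coefficient of variation = std / mean
iqr = q3 - q1                      # Interquartile range = Q3 - Q1
value_range = maximum - minimum    # Range = Maximum - Minimum
variance = std ** 2                # Variance = squared standard deviation

print(mean, cv, iqr, value_range, variance)
```

The recomputed values agree with the reported ones up to rounding of the inputs.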
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
20471< 0.1%
 
117271< 0.1%
 
219841< 0.1%
 
424621< 0.1%
 
486051< 0.1%
 
363151< 0.1%
 
342661< 0.1%
 
404091< 0.1%
 
383601< 0.1%
 
96781< 0.1%
 
199391< 0.1%
 
158211< 0.1%
 
137721< 0.1%
 
35311< 0.1%
 
14821< 0.1%
 
76251< 0.1%
 
55761< 0.1%
 
260541< 0.1%
 
240331< 0.1%
 
322291< 0.1%
 
261181< 0.1%
 
342981< 0.1%
 
302121< 0.1%
 
179221< 0.1%
 
240651< 0.1%
 
Other values (40370)4037099.9%
 
ValueCountFrequency (%) 
01< 0.1%
 
11< 0.1%
 
31< 0.1%
 
41< 0.1%
 
51< 0.1%
 
61< 0.1%
 
71< 0.1%
 
81< 0.1%
 
91< 0.1%
 
111< 0.1%
 
ValueCountFrequency (%) 
520751< 0.1%
 
520731< 0.1%
 
520721< 0.1%
 
520711< 0.1%
 
520701< 0.1%
 
520681< 0.1%
 
520671< 0.1%
 
520651< 0.1%
 
520641< 0.1%
 
520631< 0.1%
 

postal_code
Real number (ℝ≥0)

Distinct: 1057
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5195.044139
Minimum: 1000
Maximum: 9992
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1080
Q1: 2360
Median: 4630
Q3: 8400
95-th percentile: 9420
Maximum: 9992
Range: 8992
Interquartile range (IQR): 6040

Descriptive statistics

Standard deviation: 2979.185308
Coefficient of variation (CV): 0.5734667942
Kurtosis: -1.517977446
Mean: 5195.044139
Median Absolute Deviation (MAD): 2845
Skewness: 0.08975300772
Sum: 209853808
Variance: 8875545.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
83007501.9%
 
84006851.7%
 
90006341.6%
 
11804981.2%
 
10004511.1%
 
83704471.1%
 
40003941.0%
 
86703650.9%
 
20003230.8%
 
10503230.8%
 
10703010.7%
 
10303000.7%
 
93002890.7%
 
86202850.7%
 
35002790.7%
 
10802690.7%
 
80002670.7%
 
23002590.6%
 
21002580.6%
 
20182480.6%
 
88002350.6%
 
86602320.6%
 
91002190.5%
 
40202180.5%
 
84302170.5%
 
Other values (1032)3164978.3%
 
ValueCountFrequency (%) 
10004511.1%
 
10201230.3%
 
10303000.7%
 
10401450.4%
 
10503230.8%
 
10601290.3%
 
10703010.7%
 
10802690.7%
 
1081640.2%
 
1082570.1%
 
ValueCountFrequency (%) 
99925< 0.1%
 
999114< 0.1%
 
9990490.1%
 
998810< 0.1%
 
99822< 0.1%
 
99813< 0.1%
 
99804< 0.1%
 
99713< 0.1%
 
99706< 0.1%
 
996815< 0.1%
 

city_name
Categorical

HIGH CARDINALITY

Distinct: 1046
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value                 Count  Frequency (%)
Antwerpen             886    2.2%
Knokke                750    1.9%
Oostende              685    1.7%
Gent                  634    1.6%
Uccle                 498    1.2%
Bruxelles             451    1.1%
Uitkerke              447    1.1%
Glain                 394    1.0%
Wulpen                365    0.9%
Ixelles               323    0.8%
Deurne                309    0.8%
Anderlecht            301    0.7%
Schaerbeek            300    0.7%
Aalst                 289    0.7%
Nieuwpoort            285    0.7%
Hasselt               279    0.7%
Molenbeek-Saint-Jean  269    0.7%
Brugge                267    0.7%
Turnhout              259    0.6%
Beveren               258    0.6%
De Panne              232    0.6%
Nieuwkerken-Waas      219    0.5%
Liège                 218    0.5%
Middelkerke           217    0.5%
Renaix                213    0.5%
Other values (1021)   31047  76.9%
 
Frequencies of value counts

Unique

Unique: 84
Unique (%): 0.2%
Histogram of lengths of the category

Length

Max length: 30
Median length: 8
Mean length: 8.565614556
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 62
Unique unicode categories: 5
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e6391118.5%
 
n266667.7%
 
r238966.9%
 
a187035.4%
 
l186555.4%
 
o164024.7%
 
i163994.7%
 
t159784.6%
 
s148114.3%
 
u111323.2%
 
k88922.6%
 
-84182.4%
 
m75372.2%
 
g64891.9%
 
B60271.7%
 
d59651.7%
 
h51981.5%
 
b46661.3%
 
c44501.3%
 
p43921.3%
 
L40051.2%
 
S37981.1%
 
A37961.1%
 
M37061.1%
 
G34261.0%
 
Other values (37)3869011.2%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter28741783.1%
 
Uppercase Letter4929414.2%
 
Dash Punctuation84182.4%
 
Space Separator5170.1%
 
Other Punctuation3620.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
B602712.2%
 
L40058.1%
 
S37987.7%
 
A37967.7%
 
M37067.5%
 
G34267.0%
 
H33436.8%
 
W28925.9%
 
E19534.0%
 
K19524.0%
 
O18163.7%
 
D17233.5%
 
N14893.0%
 
T13412.7%
 
P12522.5%
 
R11322.3%
 
C10232.1%
 
U9451.9%
 
J9441.9%
 
F8751.8%
 
Z6421.3%
 
I6091.2%
 
V5631.1%
 
Q330.1%
 
À6< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e6391122.2%
 
n266669.3%
 
r238968.3%
 
a187036.5%
 
l186556.5%
 
o164025.7%
 
i163995.7%
 
t159785.6%
 
s148115.2%
 
u111323.9%
 
k88923.1%
 
m75372.6%
 
g64892.3%
 
d59652.1%
 
h51981.8%
 
b46661.6%
 
c44501.5%
 
p43921.5%
 
v31851.1%
 
w26010.9%
 
x17880.6%
 
z13730.5%
 
j10360.4%
 
f8350.3%
 
y6820.2%
 
Other values (8)17750.6%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-8418100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'362100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
517100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin33671197.3%
 
Common92972.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e6391119.0%
 
n266667.9%
 
r238967.1%
 
a187035.6%
 
l186555.5%
 
o164024.9%
 
i163994.9%
 
t159784.7%
 
s148114.4%
 
u111323.3%
 
k88922.6%
 
m75372.2%
 
g64891.9%
 
B60271.8%
 
d59651.8%
 
h51981.5%
 
b46661.4%
 
c44501.3%
 
p43921.3%
 
L40051.2%
 
S37981.1%
 
A37961.1%
 
M37061.1%
 
G34261.0%
 
H33431.0%
 
Other values (34)3446810.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
-841890.5%
 
5175.6%
 
'3623.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII34452399.6%
 
None14850.4%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e6391118.6%
 
n266667.7%
 
r238966.9%
 
a187035.4%
 
l186555.4%
 
o164024.8%
 
i163994.8%
 
t159784.6%
 
s148114.3%
 
u111323.2%
 
k88922.6%
 
-84182.4%
 
m75372.2%
 
g64891.9%
 
B60271.7%
 
d59651.7%
 
h51981.5%
 
b46661.4%
 
c44501.3%
 
p43921.3%
 
L40051.2%
 
S37981.1%
 
A37961.1%
 
M37061.1%
 
G34261.0%
 
Other values (29)3720510.8%
 

Most frequent None characters

ValueCountFrequency (%) 
é68145.9%
 
è54937.0%
 
ê896.0%
 
â745.0%
 
ô674.5%
 
ë100.7%
 
à90.6%
 
À60.4%
 
type_of_property
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
0      21440  53.1%
1      18955  46.9%
 

price
Real number (ℝ≥0)

Distinct: 3517
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 314114.6616
Minimum: 2500
Maximum: 950000
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 2500
5-th percentile: 120000
Q1: 199000
Median: 275000
Q3: 379000
95-th percentile: 680000
Maximum: 950000
Range: 947500
Interquartile range (IQR): 180000

Descriptive statistics

Standard deviation: 168151.6724
Coefficient of variation (CV): 0.5353194006
Kurtosis: 1.949629927
Mean: 314114.6616
Median Absolute Deviation (MAD): 85000
Skewness: 1.370488373
Sum: 1.268866176e+10
Variance: 2.827498492e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
2490005561.4%
 
1990005511.4%
 
2990005451.3%
 
2250005211.3%
 
2950005201.3%
 
2750005171.3%
 
3250004281.1%
 
1750004171.0%
 
2350004151.0%
 
1950004111.0%
 
3950004091.0%
 
1850004021.0%
 
2650003991.0%
 
2450003891.0%
 
2500003871.0%
 
2850003710.9%
 
3490003690.9%
 
2150003540.9%
 
1650003380.8%
 
2390003350.8%
 
3500003270.8%
 
2690003260.8%
 
2200003140.8%
 
1790003130.8%
 
2290003120.8%
 
Other values (3492)3016974.7%
 
ValueCountFrequency (%) 
25003< 0.1%
 
66001< 0.1%
 
81601< 0.1%
 
99991< 0.1%
 
100004< 0.1%
 
118251< 0.1%
 
125001< 0.1%
 
145001< 0.1%
 
150006< 0.1%
 
190001< 0.1%
 
ValueCountFrequency (%) 
950000700.2%
 
9490008< 0.1%
 
9480002< 0.1%
 
9470003< 0.1%
 
945000320.1%
 
9400007< 0.1%
 
9390001< 0.1%
 
9360001< 0.1%
 
9350003< 0.1%
 
9300009< 0.1%
 

number_of_rooms
Real number (ℝ≥0)

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.813838346
Minimum: 1
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 18
Range: 17
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.26096777
Coefficient of variation (CV): 0.4481308502
Kurtosis: 6.883403348
Mean: 2.813838346
Median Absolute Deviation (MAD): 1
Skewness: 1.578245584
Sum: 113665
Variance: 1.590039718
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
ValueCountFrequency (%) 
21384734.3%
 
31336733.1%
 
4574714.2%
 
1414510.3%
 
520355.0%
 
67892.0%
 
72280.6%
 
81040.3%
 
9440.1%
 
10420.1%
 
1120< 0.1%
 
1212< 0.1%
 
134< 0.1%
 
154< 0.1%
 
163< 0.1%
 
143< 0.1%
 
181< 0.1%
 
ValueCountFrequency (%) 
1414510.3%
 
21384734.3%
 
31336733.1%
 
4574714.2%
 
520355.0%
 
67892.0%
 
72280.6%
 
81040.3%
 
9440.1%
 
10420.1%
 
ValueCountFrequency (%) 
181< 0.1%
 
163< 0.1%
 
154< 0.1%
 
143< 0.1%
 
134< 0.1%
 
1212< 0.1%
 
1120< 0.1%
 
10420.1%
 
9440.1%
 
81040.3%
 

house_area
Real number (ℝ≥0)

Distinct: 657
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 152.4663201
Minimum: 5
Maximum: 3560
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 5
5-th percentile: 60
Q1: 92
Median: 130
Q3: 184
95-th percentile: 324
Maximum: 3560
Range: 3555
Interquartile range (IQR): 92

Descriptive statistics

Standard deviation: 95.64920638
Coefficient of variation (CV): 0.6273464614
Kurtosis: 60.1319554
Mean: 152.4663201
Median Absolute Deviation (MAD): 42
Skewness: 4.041212374
Sum: 6158877
Variance: 9148.770682
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
908922.2%
 
1208902.2%
 
1008762.2%
 
1508122.0%
 
1407461.8%
 
807331.8%
 
1107001.7%
 
1606851.7%
 
2006831.7%
 
1306571.6%
 
856481.6%
 
1805671.4%
 
705501.4%
 
755271.3%
 
955171.3%
 
1254501.1%
 
1704461.1%
 
1154341.1%
 
1054091.0%
 
1353770.9%
 
2203430.8%
 
1453420.8%
 
603410.8%
 
2503270.8%
 
653180.8%
 
Other values (632)2612564.7%
 
ValueCountFrequency (%) 
53< 0.1%
 
111< 0.1%
 
132< 0.1%
 
142< 0.1%
 
152< 0.1%
 
165< 0.1%
 
176< 0.1%
 
18220.1%
 
192< 0.1%
 
209< 0.1%
 
ValueCountFrequency (%) 
35601< 0.1%
 
20191< 0.1%
 
17001< 0.1%
 
16401< 0.1%
 
15002< 0.1%
 
14611< 0.1%
 
13501< 0.1%
 
13391< 0.1%
 
12002< 0.1%
 
11211< 0.1%
 
fully_equipped_kitchen
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
1      28176  69.8%
0      12219  30.2%
 

open_fire
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
0      38230  94.6%
1      2165   5.4%
 

terrace
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
1      25052  62.0%
0      15343  38.0%
 

garden
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
0      27419  67.9%
1      12976  32.1%
 

surface_of_the_land
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 2952
Distinct (%): 7.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 545.8400792
Minimum: 0
Maximum: 400000
Zeros: 20751
Zeros (%): 51.4%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 416
95-th percentile: 1840
Maximum: 400000
Range: 400000
Interquartile range (IQR): 416

Descriptive statistics

Standard deviation: 3609.242736
Coefficient of variation (CV): 6.612271383
Kurtosis: 4663.468703
Mean: 545.8400792
Median Absolute Deviation (MAD): 0
Skewness: 53.15034165
Sum: 22049210
Variance: 13026633.12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
02075151.4%
 
1501690.4%
 
2001600.4%
 
10001450.4%
 
3001440.4%
 
2501420.4%
 
1001380.3%
 
1201290.3%
 
4001200.3%
 
6001150.3%
 
1801110.3%
 
1301100.3%
 
2101080.3%
 
1401040.3%
 
901020.3%
 
1601010.3%
 
110970.2%
 
70960.2%
 
170950.2%
 
80920.2%
 
500920.2%
 
800900.2%
 
60840.2%
 
220820.2%
 
240800.2%
 
Other values (2927)1693841.9%
 
ValueCountFrequency (%) 
02075151.4%
 
119< 0.1%
 
21< 0.1%
 
41< 0.1%
 
52< 0.1%
 
61< 0.1%
 
72< 0.1%
 
82< 0.1%
 
103< 0.1%
 
123< 0.1%
 
ValueCountFrequency (%) 
4000001< 0.1%
 
2647811< 0.1%
 
1203001< 0.1%
 
1200002< 0.1%
 
1178001< 0.1%
 
991481< 0.1%
 
988221< 0.1%
 
888001< 0.1%
 
876001< 0.1%
 
864352< 0.1%
 
number_of_facades
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
2      14531  36.0%
0      10360  25.6%
4      8104   20.1%
3      7400   18.3%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
21453136.0%
 
01036025.6%
 
4810420.1%
 
3740018.3%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number40395100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
21453136.0%
 
01036025.6%
 
4810420.1%
 
3740018.3%
 

Most occurring scripts

ValueCountFrequency (%) 
Common40395100.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
21453136.0%
 
01036025.6%
 
4810420.1%
 
3740018.3%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII40395100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
21453136.0%
 
01036025.6%
 
4810420.1%
 
3740018.3%
 
swimming_pool
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value  Count  Frequency (%)
0      39699  98.3%
1      696    1.7%
 
state_of_the_building
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value           Count  Frequency (%)
as new          12096  29.9%
good            10985  27.2%
unknown         9796   24.3%
to be done up   2789   6.9%
to renovate     2441   6.0%
just renovated  2147   5.3%
to restore      141    0.3%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 14
Median length: 6
Mean length: 6.923233073
Min length: 4

Overview of Unicode Properties

Unique unicode characters: 17
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
n4886117.5%
 
o4465516.0%
 
e271329.7%
 
251929.0%
 
w218927.8%
 
a166846.0%
 
d159215.7%
 
u147325.3%
 
s143845.1%
 
t122474.4%
 
g109853.9%
 
k97963.5%
 
r48701.7%
 
v45881.6%
 
b27891.0%
 
p27891.0%
 
j21470.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter25447291.0%
 
Space Separator251929.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n4886119.2%
 
o4465517.5%
 
e2713210.7%
 
w218928.6%
 
a166846.6%
 
d159216.3%
 
u147325.8%
 
s143845.7%
 
t122474.8%
 
g109854.3%
 
k97963.8%
 
r48701.9%
 
v45881.8%
 
b27891.1%
 
p27891.1%
 
j21470.8%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
25192100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin25447291.0%
 
Common251929.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n4886119.2%
 
o4465517.5%
 
e2713210.7%
 
w218928.6%
 
a166846.6%
 
d159216.3%
 
u147325.8%
 
s143845.7%
 
t122474.8%
 
g109854.3%
 
k97963.8%
 
r48701.9%
 
v45881.8%
 
b27891.1%
 
p27891.1%
 
j21470.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
25192100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII279664100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
n4886117.5%
 
o4465516.0%
 
e271329.7%
 
251929.0%
 
w218927.8%
 
a166846.0%
 
d159215.7%
 
u147325.3%
 
s143845.1%
 
t122474.4%
 
g109853.9%
 
k97963.5%
 
r48701.7%
 
v45881.6%
 
b27891.0%
 
p27891.0%
 
j21470.8%
 

lattitude
Real number (ℝ≥0)

Distinct: 1051
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.313450246
Minimum: 2.580669689
Maximum: 6.3009381
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 2.580669689
5-th percentile: 2.9203275
Q1: 3.7141549
Median: 4.361194615
Q3: 4.849314652
95-th percentile: 5.622980506
Maximum: 6.3009381
Range: 3.720268411
Interquartile range (IQR): 1.135159752

Descriptive statistics

Standard deviation: 0.8119890403
Coefficient of variation (CV): 0.1882458343
Kurtosis: -0.6577956038
Mean: 4.313450246
Median Absolute Deviation (MAD): 0.5635523147
Skewness: -0.07418063984
Sum: 174241.8227
Variance: 0.6593262016
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
4.39970818862.2%
 
3.3233738617501.9%
 
2.92032756851.7%
 
3.71415496341.6%
 
4.33723484981.2%
 
4.3516974511.1%
 
3.140486814471.1%
 
5.5418643941.0%
 
2.7073119163650.9%
 
4.38157073230.8%
 
4.31234013010.7%
 
4.37371213000.7%
 
4.039642422890.7%
 
2.728398652850.7%
 
5.3368383972790.7%
 
4.32277792690.7%
 
3.20736112670.7%
 
4.9484612590.6%
 
4.4695254092580.6%
 
3.144162350.6%
 
2.5806696892320.6%
 
4.17802792190.5%
 
5.57342032180.5%
 
2.8063401232170.5%
 
3.60204652130.5%
 
Other values (1026)3112177.0%
 
ValueCountFrequency (%) 
2.5806696892320.6%
 
2.62625887< 0.1%
 
2.64344877420.1%
 
2.6449117152< 0.1%
 
2.6733217< 0.1%
 
2.7073119163650.9%
 
2.722259264800.2%
 
2.722568815< 0.1%
 
2.728398652850.7%
 
2.7409610513< 0.1%
 
ValueCountFrequency (%) 
6.30093812< 0.1%
 
6.26424981< 0.1%
 
6.2578278< 0.1%
 
6.20535735< 0.1%
 
6.18849323< 0.1%
 
6.16514841< 0.1%
 
6.12589538< 0.1%
 
6.12167935812< 0.1%
 
6.1117543280.1%
 
6.11084045< 0.1%
 

longitude
Real number (ℝ≥0)

Distinct: 1051
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.85386439
Minimum: 49.5085018
Maximum: 51.4743516
Zeros: 0
Zeros (%): 0.0%
Memory size: 315.7 KiB

Quantile statistics

Minimum: 49.5085018
5-th percentile: 50.3102184
Q1: 50.6701887
Median: 50.8704524
Q3: 51.1044854
95-th percentile: 51.2996935
Maximum: 51.4743516
Range: 1.9658498
Interquartile range (IQR): 0.4342967

Descriptive statistics

Standard deviation: 0.3251105424
Coefficient of variation (CV): 0.006393035148
Kurtosis: 1.466404874
Mean: 50.85386439
Median Absolute Deviation (MAD): 0.2222474
Skewness: -0.9593327094
Sum: 2054241.852
Variance: 0.1056968648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
51.22110978862.2%
 
51.349429657501.9%
 
51.23031776851.7%
 
51.03971296341.6%
 
50.80182014981.2%
 
50.84655734511.1%
 
51.29969354471.1%
 
50.6482053941.0%
 
51.097791753650.9%
 
50.82228543230.8%
 
50.83814113010.7%
 
50.86760413000.7%
 
50.94297552890.7%
 
51.144160052850.7%
 
50.9303582790.7%
 
50.85435512690.7%
 
51.21470832670.7%
 
51.32338122590.6%
 
51.21152842580.6%
 
50.96873122350.6%
 
51.094377752320.6%
 
51.19339082190.5%
 
50.64513812180.5%
 
51.183316952170.5%
 
50.74761922130.5%
 
Other values (1026)3112177.0%
 
ValueCountFrequency (%) 
49.508501810< 0.1%
 
49.557756211< 0.1%
 
49.558079414< 0.1%
 
49.55819252< 0.1%
 
49.564206516< 0.1%
 
49.567529618< 0.1%
 
49.57494796< 0.1%
 
49.5814209320.1%
 
49.590312513< 0.1%
 
49.5969055240.1%
 
ValueCountFrequency (%) 
51.4743516340.1%
 
51.467795710< 0.1%
 
51.460924956< 0.1%
 
51.45063155290.1%
 
51.431558254< 0.1%
 
51.412750434< 0.1%
 
51.41206885210.1%
 
51.3994474230.1%
 
51.39753661< 0.1%
 
51.3957408519< 0.1%
 

province
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value                Count  Frequency (%)
Flandre-Occidentale  7235   17.9%
Anvers               5313   13.2%
Flandre-Orientale    5102   12.6%
Hainaut              4115   10.2%
Liège                3936   9.7%
Bruxelles-Capitale   3836   9.5%
Brabant flamand      3795   9.4%
Limbourg             2593   6.4%
Brabant wallon       1759   4.4%
Namur                1564   3.9%
Luxembourg           1147   2.8%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 19
Median length: 14
Mean length: 12.25881916
Min length: 5

Overview of Unicode Properties

Unique unicode characters: 31
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a6259712.6%
 
e5891511.9%
 
n452109.1%
 
l434958.8%
 
r374467.6%
 
i268175.4%
 
t258425.2%
 
d233674.7%
 
-161733.3%
 
c144702.9%
 
u144022.9%
 
F123372.5%
 
O123372.5%
 
B93901.9%
 
b92941.9%
 
s91491.8%
 
m90991.8%
 
L76761.6%
 
g76761.6%
 
55541.1%
 
o54991.1%
 
A53131.1%
 
v53131.1%
 
x49831.0%
 
H41150.8%
 
Other values (6)187263.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter41690084.2%
 
Uppercase Letter5656811.4%
 
Dash Punctuation161733.3%
 
Space Separator55541.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
F1233721.8%
 
O1233721.8%
 
B939016.6%
 
L767613.6%
 
A53139.4%
 
H41157.3%
 
C38366.8%
 
N15642.8%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a6259715.0%
 
e5891514.1%
 
n4521010.8%
 
l4349510.4%
 
r374469.0%
 
i268176.4%
 
t258426.2%
 
d233675.6%
 
c144703.5%
 
u144023.5%
 
b92942.2%
 
s91492.2%
 
m90992.2%
 
g76761.8%
 
o54991.3%
 
v53131.3%
 
x49831.2%
 
è39360.9%
 
p38360.9%
 
f37950.9%
 
w17590.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-16173100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5554100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin47346895.6%
 
Common217274.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6259713.2%
 
e5891512.4%
 
n452109.5%
 
l434959.2%
 
r374467.9%
 
i268175.7%
 
t258425.5%
 
d233674.9%
 
c144703.1%
 
u144023.0%
 
F123372.6%
 
O123372.6%
 
B93902.0%
 
b92942.0%
 
s91491.9%
 
m90991.9%
 
L76761.6%
 
g76761.6%
 
o54991.2%
 
A53131.1%
 
v53131.1%
 
x49831.1%
 
H41150.9%
 
è39360.8%
 
C38360.8%
 
Other values (4)109542.3%
 

Most frequent Common characters

ValueCountFrequency (%) 
-1617374.4%
 
555425.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII49125999.2%
 
None39360.8%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a6259712.7%
 
e5891512.0%
 
n452109.2%
 
l434958.9%
 
r374467.6%
 
i268175.5%
 
t258425.3%
 
d233674.8%
 
-161733.3%
 
c144702.9%
 
u144022.9%
 
F123372.5%
 
O123372.5%
 
B93901.9%
 
b92941.9%
 
s91491.9%
 
m90991.9%
 
L76761.6%
 
g76761.6%
 
55541.1%
 
o54991.1%
 
A53131.1%
 
v53131.1%
 
x49831.0%
 
H41150.8%
 
Other values (5)147903.0%
 

Most frequent None characters

ValueCountFrequency (%) 
è3936100.0%
 

region
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 315.7 KiB

Value      Count  Frequency (%)
Flandre    24038  59.5%
Wallonie   12521  31.0%
Bruxelles  3836   9.5%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 9
Median length: 7
Mean length: 7.4998886
Min length: 7

Overview of Unicode Properties

Unique unicode characters: 14
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
l5675218.7%
 
e4423114.6%
 
a3655912.1%
 
n3655912.1%
 
r278749.2%
 
F240387.9%
 
d240387.9%
 
W125214.1%
 
o125214.1%
 
i125214.1%
 
B38361.3%
 
u38361.3%
 
x38361.3%
 
s38361.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter26256386.7%
 
Uppercase Letter4039513.3%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
F2403859.5%
 
W1252131.0%
 
B38369.5%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
l5675221.6%
 
e4423116.8%
 
a3655913.9%
 
n3655913.9%
 
r2787410.6%
 
d240389.2%
 
o125214.8%
 
i125214.8%
 
u38361.5%
 
x38361.5%
 
s38361.5%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin302958100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
l5675218.7%
 
e4423114.6%
 
a3655912.1%
 
n3655912.1%
 
r278749.2%
 
F240387.9%
 
d240387.9%
 
W125214.1%
 
o125214.1%
 
i125214.1%
 
B38361.3%
 
u38361.3%
 
x38361.3%
 
s38361.3%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII302958100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
l5675218.7%
 
e4423114.6%
 
a3655912.1%
 
n3655912.1%
 
r278749.2%
 
F240387.9%
 
d240387.9%
 
W125214.1%
 
o125214.1%
 
i125214.1%
 
B38361.3%
 
u38361.3%
 
x38361.3%
 
s38361.3%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
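
The calculation just described can be sketched in plain Python. The function name `pearson_r` and the toy inputs are illustrative assumptions and do not come from the report; the population forms of covariance and standard deviation are used, and the normalisation cancels in the ratio.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Toy data (made up): shifting and rescaling y leaves r unchanged.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 9.0, 12.0]
r_original = pearson_r(xs, ys)
r_rescaled = pearson_r(xs, [3 * v + 5 for v in ys])
```

Here `r_original` and `r_rescaled` coincide up to floating-point error, which illustrates the invariance under changes of location and scale.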

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
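
As a rough illustration of this definition, the sketch below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the mean of their positions is an assumption about tie handling.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)
```

For a nonlinear but strictly increasing relationship such as y = x², the ranks of x and y are identical, so ρ reaches 1 even though Pearson's r on the raw values stays below 1.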

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
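
The pair counting can be sketched as follows. The function name `kendall_tau` is illustrative, and this is the plain form described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1    # both variables move the same way
        elif direction < 0:
            discordant += 1    # the variables move in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs
```

Library implementations such as SciPy's `kendalltau` default to the tau-b variant, which adjusts for ties, so results can differ from this sketch when ties are present.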

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
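
A minimal sketch of a bias-corrected Cramér's V following Bergsma's 2013 correction is given below. The function name `cramers_v_corrected` is hypothetical, and the code is not taken from pandas-profiling itself; it only illustrates the correction under the assumption that both inputs are equally long sequences of category labels.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma's correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0
```

With two perfectly associated label sequences the function returns a value of 1 up to floating-point error, and with independent ones it returns 0.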

Missing values

Sample

First rows

df_index postal_code city_name type_of_property price number_of_rooms house_area fully_equipped_kitchen open_fire terrace garden surface_of_the_land number_of_facades swimming_pool state_of_the_building lattitude longitude province region
001050Ixelles0340000620310109520to be done up4.38157150.822285Bruxelles-CapitaleBruxelles
111050Ixelles0520000420000006920to renovate4.38157150.822285Bruxelles-CapitaleBruxelles
231050Ixelles05990004160101110020to be done up4.38157150.822285Bruxelles-CapitaleBruxelles
341050Ixelles05990003160101113020good4.38157150.822285Bruxelles-CapitaleBruxelles
451050Ixelles0575000317100004620just renovated4.38157150.822285Bruxelles-CapitaleBruxelles
561050Ixelles059000042250010020to renovate4.38157150.822285Bruxelles-CapitaleBruxelles
671050Ixelles057500042091000020unknown4.38157150.822285Bruxelles-CapitaleBruxelles
781050Ixelles05950001195111161740as new4.38157150.822285Bruxelles-CapitaleBruxelles
891050Ixelles0595777425000007020unknown4.38157150.822285Bruxelles-CapitaleBruxelles
9111050Ixelles0650000625010006020good4.38157150.822285Bruxelles-CapitaleBruxelles

Last rows

df_index postal_code city_name type_of_property price number_of_rooms house_area fully_equipped_kitchen open_fire terrace garden surface_of_the_land number_of_facades swimming_pool state_of_the_building lattitude longitude province region
40385520634342Hognoul03990004180111168030as new5.45563950.680810LiègeWallonie
40386520644342Hognoul042500033151011030unknown5.45563950.680810LiègeWallonie
40387520657743Obigies039000043401101216440unknown3.36428150.662055HainautWallonie
40388520673050Oud-Heverlee04200005185000146500to be done up4.66789750.821768Brabant flamandFlandre
40389520683050Oud-Heverlee043500042341010030as new4.66789750.821768Brabant flamandFlandre
40390520701472Vieux-Genappe047500052161100155041as new4.40150350.629025Brabant wallonWallonie
40391520711472Vieux-Genappe047500052151010155001good4.40150350.629025Brabant wallonWallonie
40392520721461Haut-Ittre049900052751011156140unknown4.29647250.648804Brabant wallonWallonie
40393520731761Borchtlombeek04950004235100148840unknown4.13691550.848178Brabant flamandFlandre
40394520753381Kapellen048500032200010101940good4.96087850.887345Brabant flamandFlandre